-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[NVPTX] Add prefetch tensormap variant #146203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-ir Author: Abhilash Majumder (abhilash1910) Changes[NVPTX] Add Prefetch tensormap intrinsics
For more information, refer to the PTX ISA for prefetch intrinsic : Prefetch Tensormap @durga4github @schwarzschild-radius Full diff: https://github.com/llvm/llvm-project/pull/146203.diff 4 Files Affected:
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 11017fe4e01b4..ca951811b73dd 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -971,6 +971,9 @@ Syntax:
declare void @llvm.nvvm.prefetch.L1(ptr %ptr)
declare void @llvm.nvvm.prefetch.L2(ptr %ptr)
+ declare void @llvm.nvvm.prefetch.tensormap.p0(ptr %ptr)
+ declare void @llvm.nvvm.prefetch.tensormap.p4(ptr addrspace(4) %const_ptr)
+
declare void @llvm.nvvm.prefetch.global.L2.evict.normal(ptr addrspace(1) %global_ptr)
declare void @llvm.nvvm.prefetch.global.L2.evict.last(ptr addrspace(1) %global_ptr)
@@ -983,7 +986,10 @@ The '``@llvm.nvvm.prefetch.*``' and '``@llvm.nvvm.prefetchu.*``' intrinsic
correspond to the '``prefetch.*``;' and '``prefetchu.*``' family of PTX instructions.
The '``prefetch.*``' instructions bring the cache line containing the
specified address in global or local memory address space into the
-specified cache level (L1 or L2). The '`prefetchu.*``' instruction brings the cache line
+specified cache level (L1 or L2). If the '``.tensormap``' qualifier is specified then the
+prefetch instruction brings the cache line containing the specified address in the
+'``.const``' or '``.param memory``' state space for subsequent use by the '``cp.async.bulk.tensor``'
+instruction.The '`prefetchu.*``' instruction brings the cache line
containing the specified generic address into the specified uniform cache level.
If no address space is specified, it is assumed to be generic address. The intrinsic
uses and eviction priority which can be accessed by the '``.level::eviction_priority``' modifier.
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 0375f29ad8906..0678bba51e4a3 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -137,6 +137,7 @@
def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr
def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr
+def llvm_constant_ptr_ty: LLVMQualPointerType<4>; // (const)ptr
def llvm_local_ptr_ty : LLVMQualPointerType<5>; // (local)ptr
def llvm_tmem_ptr_ty : LLVMQualPointerType<6>; // (tensor memory)ptr
def llvm_shared_cluster_ptr_ty : LLVMQualPointerType<7>; // (shared_cluster)ptr
@@ -2092,6 +2093,9 @@ let IntrProperties = [IntrArgMemOnly, ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<
def int_nvvm_prefetch_local_ # level : Intrinsic<[], [llvm_local_ptr_ty]>;
}
+ def int_nvvm_prefetch_tensormap_p0: DefaultAttrsIntrinsic<[], [llvm_ptr_ty]>;
+ def int_nvvm_prefetch_tensormap_p4 : DefaultAttrsIntrinsic<[], [llvm_constant_ptr_ty]>;
+
foreach eviction_priority = ["evict_normal", "evict_last"] in
def int_nvvm_prefetch_global_L2_ # eviction_priority : Intrinsic<[], [llvm_global_ptr_ty]>;
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index cc1fd027d8515..8afc7063c363a 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -760,6 +760,15 @@ def PREFETCH_GLOBAL_L1 : PREFETCH_INTRS<"prefetch.global.L1">;
def PREFETCH_LOCAL_L1 : PREFETCH_INTRS<"prefetch.local.L1">;
def PREFETCH_GLOBAL_L2 : PREFETCH_INTRS<"prefetch.global.L2">;
def PREFETCH_LOCAL_L2 : PREFETCH_INTRS<"prefetch.local.L2">;
+def PREFETCH_CONST_TENSORMAP : BasicNVPTXInst<(outs), (ins ADDR:$addr),
+ "prefetch.const.tensormap",
+ [(int_nvvm_prefetch_tensormap_p4 addr:$addr)]>,
+ Requires<[hasPTX<80>, hasSM<90>]>;
+
+def PREFETCH_GENERIC_TENSORMAP : BasicNVPTXInst<(outs), (ins ADDR:$addr),
+ "prefetch.tensormap",
+ [(int_nvvm_prefetch_tensormap_p0 addr:$addr)]>,
+ Requires<[hasPTX<80>, hasSM<90>]>;
def PREFETCH_GLOBAL_L2_EVICT_NORMAL : BasicNVPTXInst<(outs), (ins ADDR:$addr),
"prefetch.global.L2::evict_normal",
diff --git a/llvm/test/CodeGen/NVPTX/prefetch.ll b/llvm/test/CodeGen/NVPTX/prefetch.ll
index a64e4fe7a508e..b63155ff49185 100644
--- a/llvm/test/CodeGen/NVPTX/prefetch.ll
+++ b/llvm/test/CodeGen/NVPTX/prefetch.ll
@@ -12,6 +12,9 @@ declare void @llvm.nvvm.prefetch.local.L2(ptr addrspace(5) %local_ptr)
declare void @llvm.nvvm.prefetch.L1(ptr %ptr)
declare void @llvm.nvvm.prefetch.L2(ptr %ptr)
+declare void @llvm.nvvm.prefetch.tensormap.p0(ptr %ptr)
+declare void @llvm.nvvm.prefetch.tensormap.p4(ptr addrspace(4) %const_ptr)
+
declare void @llvm.nvvm.prefetch.global.L2.evict.normal(ptr addrspace(1) %global_ptr)
declare void @llvm.nvvm.prefetch.global.L2.evict.last(ptr addrspace(1) %global_ptr)
@@ -67,6 +70,33 @@ define void @prefetch_(ptr %ptr) {
ret void
}
+
+define void @prefetch_generic_tensormap(ptr %ptr) {
+; CHECK-PTX64-LABEL: prefetch_generic_tensormap(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.b64 %rd1, [prefetch_generic_tensormap_param_0];
+; CHECK-PTX64-NEXT: prefetch.tensormap [%rd1];
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.prefetch.tensormap.p0(ptr %ptr)
+ ret void
+}
+
+define void @prefetch_const_tensormap(ptr addrspace(4) %const_ptr) {
+; CHECK-PTX64-LABEL: prefetch_const_tensormap(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.b64 %rd1, [prefetch_const_tensormap_param_0];
+; CHECK-PTX64-NEXT: prefetch.const.tensormap [%rd1];
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.prefetch.tensormap.p4(ptr addrspace(4) %const_ptr)
+ ret void
+}
+
define void @prefetchu_l1(ptr %ptr) {
; CHECK-PTX64-LABEL: prefetchu_l1(
; CHECK-PTX64: {
|
@llvm/pr-subscribers-backend-nvptx Author: Abhilash Majumder (abhilash1910) Changes[NVPTX] Add Prefetch tensormap intrinsics
For more information, refer to the PTX ISA for prefetch intrinsic : Prefetch Tensormap @durga4github @schwarzschild-radius Full diff: https://github.com/llvm/llvm-project/pull/146203.diff 4 Files Affected:
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 11017fe4e01b4..ca951811b73dd 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -971,6 +971,9 @@ Syntax:
declare void @llvm.nvvm.prefetch.L1(ptr %ptr)
declare void @llvm.nvvm.prefetch.L2(ptr %ptr)
+ declare void @llvm.nvvm.prefetch.tensormap.p0(ptr %ptr)
+ declare void @llvm.nvvm.prefetch.tensormap.p4(ptr addrspace(4) %const_ptr)
+
declare void @llvm.nvvm.prefetch.global.L2.evict.normal(ptr addrspace(1) %global_ptr)
declare void @llvm.nvvm.prefetch.global.L2.evict.last(ptr addrspace(1) %global_ptr)
@@ -983,7 +986,10 @@ The '``@llvm.nvvm.prefetch.*``' and '``@llvm.nvvm.prefetchu.*``' intrinsic
correspond to the '``prefetch.*``;' and '``prefetchu.*``' family of PTX instructions.
The '``prefetch.*``' instructions bring the cache line containing the
specified address in global or local memory address space into the
-specified cache level (L1 or L2). The '`prefetchu.*``' instruction brings the cache line
+specified cache level (L1 or L2). If the '``.tensormap``' qualifier is specified then the
+prefetch instruction brings the cache line containing the specified address in the
+'``.const``' or '``.param memory``' state space for subsequent use by the '``cp.async.bulk.tensor``'
+instruction.The '`prefetchu.*``' instruction brings the cache line
containing the specified generic address into the specified uniform cache level.
If no address space is specified, it is assumed to be generic address. The intrinsic
uses and eviction priority which can be accessed by the '``.level::eviction_priority``' modifier.
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 0375f29ad8906..0678bba51e4a3 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -137,6 +137,7 @@
def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr
def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr
+def llvm_constant_ptr_ty: LLVMQualPointerType<4>; // (const)ptr
def llvm_local_ptr_ty : LLVMQualPointerType<5>; // (local)ptr
def llvm_tmem_ptr_ty : LLVMQualPointerType<6>; // (tensor memory)ptr
def llvm_shared_cluster_ptr_ty : LLVMQualPointerType<7>; // (shared_cluster)ptr
@@ -2092,6 +2093,9 @@ let IntrProperties = [IntrArgMemOnly, ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<
def int_nvvm_prefetch_local_ # level : Intrinsic<[], [llvm_local_ptr_ty]>;
}
+ def int_nvvm_prefetch_tensormap_p0: DefaultAttrsIntrinsic<[], [llvm_ptr_ty]>;
+ def int_nvvm_prefetch_tensormap_p4 : DefaultAttrsIntrinsic<[], [llvm_constant_ptr_ty]>;
+
foreach eviction_priority = ["evict_normal", "evict_last"] in
def int_nvvm_prefetch_global_L2_ # eviction_priority : Intrinsic<[], [llvm_global_ptr_ty]>;
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index cc1fd027d8515..8afc7063c363a 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -760,6 +760,15 @@ def PREFETCH_GLOBAL_L1 : PREFETCH_INTRS<"prefetch.global.L1">;
def PREFETCH_LOCAL_L1 : PREFETCH_INTRS<"prefetch.local.L1">;
def PREFETCH_GLOBAL_L2 : PREFETCH_INTRS<"prefetch.global.L2">;
def PREFETCH_LOCAL_L2 : PREFETCH_INTRS<"prefetch.local.L2">;
+def PREFETCH_CONST_TENSORMAP : BasicNVPTXInst<(outs), (ins ADDR:$addr),
+ "prefetch.const.tensormap",
+ [(int_nvvm_prefetch_tensormap_p4 addr:$addr)]>,
+ Requires<[hasPTX<80>, hasSM<90>]>;
+
+def PREFETCH_GENERIC_TENSORMAP : BasicNVPTXInst<(outs), (ins ADDR:$addr),
+ "prefetch.tensormap",
+ [(int_nvvm_prefetch_tensormap_p0 addr:$addr)]>,
+ Requires<[hasPTX<80>, hasSM<90>]>;
def PREFETCH_GLOBAL_L2_EVICT_NORMAL : BasicNVPTXInst<(outs), (ins ADDR:$addr),
"prefetch.global.L2::evict_normal",
diff --git a/llvm/test/CodeGen/NVPTX/prefetch.ll b/llvm/test/CodeGen/NVPTX/prefetch.ll
index a64e4fe7a508e..b63155ff49185 100644
--- a/llvm/test/CodeGen/NVPTX/prefetch.ll
+++ b/llvm/test/CodeGen/NVPTX/prefetch.ll
@@ -12,6 +12,9 @@ declare void @llvm.nvvm.prefetch.local.L2(ptr addrspace(5) %local_ptr)
declare void @llvm.nvvm.prefetch.L1(ptr %ptr)
declare void @llvm.nvvm.prefetch.L2(ptr %ptr)
+declare void @llvm.nvvm.prefetch.tensormap.p0(ptr %ptr)
+declare void @llvm.nvvm.prefetch.tensormap.p4(ptr addrspace(4) %const_ptr)
+
declare void @llvm.nvvm.prefetch.global.L2.evict.normal(ptr addrspace(1) %global_ptr)
declare void @llvm.nvvm.prefetch.global.L2.evict.last(ptr addrspace(1) %global_ptr)
@@ -67,6 +70,33 @@ define void @prefetch_(ptr %ptr) {
ret void
}
+
+define void @prefetch_generic_tensormap(ptr %ptr) {
+; CHECK-PTX64-LABEL: prefetch_generic_tensormap(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.b64 %rd1, [prefetch_generic_tensormap_param_0];
+; CHECK-PTX64-NEXT: prefetch.tensormap [%rd1];
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.prefetch.tensormap.p0(ptr %ptr)
+ ret void
+}
+
+define void @prefetch_const_tensormap(ptr addrspace(4) %const_ptr) {
+; CHECK-PTX64-LABEL: prefetch_const_tensormap(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.b64 %rd1, [prefetch_const_tensormap_param_0];
+; CHECK-PTX64-NEXT: prefetch.const.tensormap [%rd1];
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.prefetch.tensormap.p4(ptr addrspace(4) %const_ptr)
+ ret void
+}
+
define void @prefetchu_l1(ptr %ptr) {
; CHECK-PTX64-LABEL: prefetchu_l1(
; CHECK-PTX64: {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest revision lgtm
✅ With the latest revision this PR passed the C/C++ code formatter. |
BasicNVPTXInst<(outs), (ins ADDR:$addr), | ||
InstName, | ||
[(!cast<Intrinsic>(!strconcat("int_nvvm_", | ||
!subst(".", "_", InstName))) addr:$addr)]>, | ||
Requires<[hasPTX<80>, hasSM<90>]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have these predicates been removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this needs to be added, fixing in progress. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this, will update. thanks
@@ -587,6 +588,9 @@ Value *NVPTXTTIImpl::rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, | |||
return ConstantInt::get(II->getType(), *R); | |||
return nullptr; | |||
} | |||
case Intrinsic::nvvm_prefetch_tensormap: { | |||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order for this to be useful we need to actually rewrite the intrinsic here. That is, create an nvvm_prefetch_tensormap
intrinsic with NewV. It would also be good to add a couple tests to confirm this works as expected.
Module *M = II->getModule(); | ||
Function *NewIntrinsic = Intrinsic::getDeclaration( | ||
M, Intrinsic::nvvm_prefetch_tensormap, {NewV->getType()}); | ||
CallInst *NewCall = Builder.CreateCall(NewIntrinsic, {NewV}); | ||
NewCall->setAttributes(II->getAttributes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Builder.CreateUnaryIntrinsic
BasicNVPTXInst<(outs), (ins ADDR:$addr), | ||
InstName, | ||
[(!cast<Intrinsic>(!strconcat("int_nvvm_", | ||
!subst(".", "_", InstName))) addr:$addr)]>, | ||
Requires<[hasPTX<80>, hasSM<90>]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been fixed?
def "" : BasicNVPTXInst<(outs), (ins ADDR:$addr), | ||
!strconcat("prefetch", !if(!eq(addrspace_name, ""), "", | ||
!strconcat(".", addrspace_name)), ".tensormap [$addr];"), | ||
[(pattern_frag addr:$addr)]>, | ||
Requires<[hasPTX<80>, hasSM<90>]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use #
instead of !strconcat(
and remove " [$addr];"
as this will be added automatically by BasicNVPTXInst. I think this could be simplified by passing ".param" and ".const" for addrspace_name.
defvar frag_pat = (int_nvvm_prefetch_tensormap node:$addr); | ||
|
||
def prefetch_tensormap_const : PatFrag<!setdagop(frag_pat, ops), frag_pat, AS_match.const>; | ||
def prefetch_tensormap_gen : PatFrag<!setdagop(frag_pat, ops), frag_pat, AS_match.generic>; | ||
def prefetch_tensormap_param : PatFrag<!setdagop(frag_pat, ops), frag_pat, AS_match.param>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this part of the multiclass and just pass AS_match.const/generic/param
as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yes.
[NVPTX] Add Prefetch tensormap intrinsics
This PR adds prefetch intrinsics with the relevant tensormap_space.
For more information, refer to the PTX ISA for prefetch intrinsic : Prefetch Tensormap
@durga4github @schwarzschild-radius